THE RELATION BETWEEN APPROXIMATION IN DISTRIBUTION 
AND SHADOWING IN MOLECULAR DYNAMICS* 

PAUL TUPPERt 

Abstract. Molecular dynamics refers to the computer simulation of a material at the atomic 
level. An open problem in numerical analysis is to explain the apparent reliability of molecular 
dynamics simulations. The difficulty is that individual trajectories computed in molecular dynamics 
are accurate for only short time intervals, whereas apparently reliable information can be extracted 
from very long-time simulations. It has been conjectured that long molecular dynamics trajectories 
have low-dimensional statistical features that accurately approximate those of the original system. 
Another conjecture is that numerical trajectories satisfy the shadowing property: that they are close 
over long time intervals to exact trajectories but with different initial conditions. We prove that these 
two views are actually equivalent to each other, after we suitably modify the concept of shadowing. 
A key ingredient of our result is a general theorem that allows us to take random elements of a 
metric space that are close in distribution and embed them in the same probability space so that 
they are close in a strong sense. This result is similar to the Strassen-Dudley Theorem except that a 
mapping is provided between the two random elements. Our results on shadowing are motivated by 
molecular dynamics but apply to the approximation of any dynamical system when initial conditions 
are selected according to a probability measure. 
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1. Introduction. In the form we consider here, molecular dynamics consists 
of modeling an atomistic system with a system of Hamiltonian ordinary differential 
equations that are numerically integrated. For given initial conditions information of 
physical interest is extracted from the resulting approximate trajectories [TTl [Tl [20]. 
Despite the scientific importance of molecular dynamics there is very little rigorous 
justification of the results it produces. The problem is that individual trajectories 
computed by molecular dynamics simulations are accurate for only short time in- 
tervals. Numerical trajectories diverge rapidly from true trajectories given the step- 
lengths used in practice. This phenomenon is well-known but is not considered a 
short-coming of molecular dynamics in practical terms [IH p. 81]. Experience has 
suggested that the features of trajectories that researchers wish to study are com- 
puted accurately. However, that trajectories are reliable in this sense has yet to be 
rigorously demonstrated in representative cases. 

Two proposals have emerged to explain the success of molecular dynamics given 
the inaccuracy of the computed trajectories |28| . The first we refer to as approximation 
in distribution and the second as shadowing. 

The idea of approximation in distribution is to view both exact trajectories and 
numerical trajectories as stochastic processes. This is done by drawing initial condi- 
tions from a physically appropriate probability distribution rather than considering a 
single fixed initial condition. Both the resulting numerical trajectory and the resulting 
exact trajectory are then random. The proposal is that in some distributional (statis- 
tical) sense the numerical trajectories and exact trajectories are close to each other. 
This means roughly that the probability of some event happening for the numerical 
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trajectory is close to that of the same event happening for the exact trajectory. Put 
in this way, approximation in distribution may not hold for arbitrary events which 
depend on the position of every single atom in the system. However, usually we are 
only interested in low-dimensional functions of the state of the full system. An exam- 
ple we shall consider in the following is when one is only interested in the position of 
a single particle in a system consisting of many particles. It may be that for systems 
studied in molecular dynamics statistical features of trajectories of single particles 
are reproduced accurately in simulations. Approximation in distribution for low di- 
mensional functions of numerical trajectories has been studied for model systems in 
[3 114| I32j using a combination of analysis and computational experiments, though it 
has not been established for more realistic systems. 

The idea of shadowing is to show that, even though a numerical trajectory di- 
verges rapidly from the corresponding exact trajectory, it may be possible to show 
that the numerical trajectory is close to another exact trajectory with different initial 
conditions. If shadowing holds, then numerical trajectories from molecular dynamics 
simulations can be viewed as real trajectories with some small observational error. 
Shadowing has been established for various types of dynamical systems with uniform 
hyperbolicity properties [22j and these ideas have been applied to Hamiltonian dy- 
namical system such as those studied in molecular dynamics ^ . However, shadowing 
over arbitrarily long time intervals is probably not possible for realistic Hamiltonian 
systems [12]. Moreover, shadowing as it is usually defined does not guarantee that 
statistical features of numerical trajectories match those of the exact trajectories [TS]. 
We will discuss these issues further and show how to modify the concept of shadowing 
suitably so that it is applicable to our case and to other situations where the initial 
conditions of the dynamical system are distributed according to some probability 
measure. 

The main purpose of this paper is to carefully define and quantify these two 
concepts, approximation in distribution and shadowing, in the context of molecular 
dynamics and to explain the relation between them. Our main result shows that when 
the two concepts are formalized and suitably modified they are actually equivalent. 

In Section [5] we introduce a model system for molecular dynamics and present 
the results of some numerical experiments performed with it. We first demonstrate 
that numerical trajectories using practical time steps diverge rapidly from exact tra- 
jectories. We then provide evidence that statistical features of some low-dimensional 
functions of the trajectories are nevertheless reliable. 

In Section [3] we discuss approximation in distribution and shadowing in detail 
and give quantitative versions of each idea. In particular we show how to adapt the 
idea of shadowing to situations where initial conditions are distributed according to 
a probability measure. Our concept, which is a modification of the usual notion of 
shadowing for dynamical systems, we call Weak Shadowing. 

In Section |4] we prove our main result, Theorem ll.il which we state here. In what 
follows, the space (C[0,T])'" is the set of all continuous trajectories on [0,T] taking 
values in M™. For x,y £ (C[0, T])™, we define ||a; - y||oo = supj^jQ ^j \x{t) - y(t)\. 
We let H be a map M™ -> R'' and we define H(a;) £ (C[0,T])'= by Il{x){t) = Il{x{t)) 
for any x G (C[0,T])™. We let Xo be a random initial condition in R™ and then 
denote by X the random member of (C[0, T])™ starting at Xq and generated by the 
differential equations. When we use a numerical method to generate an approximate 
solution to the differential equations at a sequence of points, we use Xat to denote 
the random element of (C[0,T])'" generated by its linear interpolation. Finally, we 
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denote by p the well-known Prokhorov metric on random elements of metric spaces 
which we will define in Subsection 13.11 It metrizes convergence in distribution so 
that if two random elements of a metric space are close according to p they have 
approximately the same distribution. 

Theorem 1.1. Let he a random vector in R™ such that P(Xo ^ x) = 
for any x £ R™. Let X be the random trajectory in (C[0,T])™ generated by a 
system of differential equations starting from Xq . Let XAt be the random trajectory 
of (C[0, r])™ generated by a numerical method starting at Xq. Let 11 : R™ SJ' be 
a map. Then the following are equivalent for all e > 0; 

(A) Approximation in distribution. 

p{U{X), At)) < e. 

(B) Weak Shadowing. There is a map S^t ■ ^ such that Yq ~ S^iXo has 
the same probability distribution as Xq and ifY is the random member of (C[0, T])'" 
starting at Yq generated by the flow of the differential equations then 

p(||n(XAt)-n(r)|U >e) <e. 



Note that no assumptions are made about the differential equations that generate 
the exact trajectory X nor about the numerical method that generates the approx- 
imate trajectory Xa- The theorem just asserts that two ways in which a numerical 
method can be accurate, (A) and (B), are equivalent. The theorem does not assert 
that (A) or (B) holds for any particular system or any particular method. 

Showing that (B) implies (A) is straightforward, but the converse requires the 
result of Theorem 14.31 which is a version of the Strassen-Dudley theorem [3D] , [HI 
§11.6]. The original Strassen-Dudley Theorem shows that two random variables that 
are close with respect to the Prokhorov metric p can be embedded in a new probability 
space where they are close in a strong sense. Our contribution is to show how to do 
this with one random variable defined as a function of the other. The only important 
extra assumption needed is that the measures induced by the random variables be 
non- atomic, which means that they assign zero measure to any point. 

Finally, in Section Owe conclude with a discussion of what our result suggests for 
the numerical analysis of molecular dynamics. 

2. Numerical Experiments. We consider a system of n = 100 point particles 
interacting on an 11.5 by 11.5 square periodic domain. We let q G T^" and p £ K^" 
denote the positions and velocities of the particles, with qi £ '^'^,Pi £ R^ denoting 
the position and velocity of particle i. The motion of the particles is described by a 
system of Hamiltonian differential equations: 

dq dH dp dH 



with Hamiltonian 



dt dp ' dt dq' ^^'"^^ 

H{q,p) = l\\p\\l + Y.VLAU-q,\\). 

i<j 

Here Vlj denotes the famous Lennard-Jones potential TT \ In our simulations we use 
a truncated but infinitely smooth version p. 2409]: 



Vi,/(r) = 



1 _ 1 

0, otherwise. 



4 (ttt - exp[(r - r,„t„ff) if r < r„t„ff. 
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We set r„„toff = 2.5. 

For our first numerical experiments we take our initial conditions q{0),p{0) to be 
randomly distributed according to the probability density function 

Ce-/3ff(9,p)^ (2.2) 

where C is chosen so that J C'e~^^'^'''P^ dqdp = 1. The probability distribution with 
this density function is known as the canonical distribution for the system with Hamil- 
tonian H at temperature /3^^. It is intended to model the equilibrium distribution 
of the system when it is in thermal contact with an environment of temperature l3~^ 
[M^, Sec. 6.2]. For our experiments we fixed (3=1 and generate (g(0),p(0)) according 
to (|2.2p using Langevin dynamics [^. We then subtract a constant vector from the 
velocities of all the particles so that center of mass of the system has zero velocity. 
The canonical distribution for any (3 > with this adjustment is invariant with re- 
spect to the dynamics described by (|2.ip . Later in this section we will perform further 
experiments with a nonequilibrium distribution on the initial conditions. 

We numerically integrate (|2.ip using the Stormer-Verlet scheme, which is an ex- 
plicit second-order method for our system |13| . It is the standard numerical integrator 
used in molecular dynamics [HI p. 69]. Given an initial {q°,p°) = (g(0),p(0)) and a 
At > 0, the Stormer-Verlet scheme generates a sequence of states n> such 

that ((?"■, p") « {q{nAt),p{nAt)). The version of the algorithm we use is 

(7"+p"At/2, 

A practical steplength for simulations of our system with the Stormer-Verlet method 
is At = 0.01. This choice of At is close to the largest for which the system can be 
integrated without an explosive instability in energy on the interval [0, 1000] for the 
initial conditions we consider. 

In the introduction we mentioned that researchers only consider low dimensional 
information from a molecular dynamics simulations. For our numerical experiments 
we will consider the configuration over time of the first particle: qi{t) G T^, t £ [0, T]. 
For the purposes of our experiments it helps to view the motion of the particle as 
occurring in and starting at the origin. To this end, for the exact trajectory we 
define 

Q{t)= f piis)ds, 
Jo 

and let Qx{t) and Qy{t) denote the respective x and y coordinates. We have that 
(9a:(0) = Qy{Q) = and if we let t vary in [0,T] then Q e {C[Q,T\f. Similarly, for 
the numerical trajectory for each At we define 

n-l 

We then define Qa* to be the linear interpolation the the Q" at the times nAt so 
that QAt e (C[0,T])2. 

Our first set of numerical experiments demonstrates the qualitative features of the 
trajectories Q over the time interval [0, 20]. We select three random initial conditions 
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Fig. 2.1. Krsi sei of experiments. Plots of realizations ofQ^^ ,^ vs. Q/\t,y for three different 
initial conditions over the time interval [0,20] with At = 0.01. The initial position is designated by 
a circle and the final position by an X. 



from the distribution given by (12.21) and plot in Figure 12.11 the resulting numerical 
trajectories when At = 0.01. At the scale shown here there was not a noticeable qual- 
itative difference between these plots and the similar plots generated with a smaller 
At. The motion is highly irregular, looking somewhat like Brownian motion in R^. 
However, unlike Brownian motion, the exact trajectories Q are infinitely smooth. The 
interpolated numerical approximations Qa* are piecewise linear. 



Our second set of experiments demonstrates that individual trajectories computed 
using the timestep At = 0.01 are not accurate over time scales of interest. We 
randomly generate one initial condition according to the canonical distribution and 
then simulate over [0,10] for At = 0.01,0.005,0.0025. In Figure [2J]we plot QAt,x{t) 
versus t for each of these steplengths. If the trajectory computed with steplength 
At = 0.01 is accurate over the time interval [0, 10], we expect that reducing the time 
step by a factor of two would not yield a significantly different curve. However, we see 
that the two curves for At = 0.01 and At = 0.005 very quickly diverge. Moreover, we 
see that the trajectory with At = 0.005 is not accurate either, since it diverges quickly 
from the trajectory with timestep At — 0.0025. Obtaining an accurate trajectory 
over the interval [0, 10] and certainly over [0, 100] would require At to be considerably 
smaller than what is used in practice. This same convergence behaviour is observed 
for all initial conditions selected according to the canonical distribution. 



Our third set of experiments shows that despite the inaccuracy of individual sim- 
ulations of the system, the statistical features of numerical trajectories appear to be 
highly accurate even for At — 0.01. Again we consider the trajectory of a single 
particle for initial conditions drawn from the distribution defined by (|2.2p . For each 
randomly generated initial condition we compute the value of a collection of function- 
als of the numerical trajectories over the time intervals [0, 100] and [0, 1000]. We plot 
these values in histograms in order to observe the distribution of each functional. We 
do this for each of At = 0.01,0.005,0.0025 and for five functionals of the trajectory. 
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Fig. 2.2. Second set of experiments. Computed QAt{t) versus t for fixed initial conditions for 
a range of At. 



The functionals Fi : (K[0, T])^ ^ R we consider are 

^ [ Q,{t) sm{2Trt /T)dt, 
^ Jo 

max ||Q(t)||, 
te[Q,T] 

mm{te[0,T]: \\Q{t)\\ > 1}, 

(Q(r) - QjT ~ T)f{Q{T - r) - Q{T - 2t)) ^ 
\\Q{T) - Q{T ~ t)\\ \\Q{T - r) - Q{T - 2r)|| ' 

Fi is simply the x-position of the particle at time T. F2 is the average of Qx{t) sin(27ri/T) 
over [0,T]. F^ is the maximum distance from its initial condition that the particle 
attains on [0,r]. F4 is the first time at which the particle leaves a ball of radius 1 
centred at its initial condition. Its value was set to T if the particle did not leave 
within [0, T]. F^ is the cosine of the angle between two adjacent increments of Q just 
before time T. 

In Figure 12.31 we show histograms of Fi{Q^t) for i — 1, . . . , 5 with the different 
values of At over the time interval [0, 100]. We also show the analogous histograms 
for two-dimensional Brownian motion B{t) scaled so that Bx{t) and QAt,x have the 
same variance. We see that for all five functions the histogram generated does not 
appear to depend on the steplength used. As well, we see that for some of the Fi this 
matches closely the same histogram for Brownian motion, whereas for others it does 
not. Note that we do not expect the histograms of Q^t to converge to those for B: 
the exact trajectory Q and Brownian motion B do not have the same distribution. 
These results are duplicated in Figure 12.41 where we show the analogous plots for the 
time interval [0, 1000]. 



FiiQ) = 

F2{Q) = 

Fz{Q) = 
FiiQ) = 
F^iQ) = 
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Finally, in our fourth numerical experiment we repeat the previous experiment but 
we start with initial conditions that are drawn from a non-equilibrium distribution. 
We randomly generate the initial conditions by first drawing from the equilibrium 
distribution (|2.2p as before. We then add 10 to the velocity in the x direction of the 
first particle. The typical trajectory arising from an initial condition selected in this 
way involves the first particle rapidly losing its excess energy through collisions with 
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Fig. 2.4. Third set of experiments for the time interval [0,1000]. Histograms of Fi{Q/^t) 
for i = 1,...,5. Each plot shows the result for the numerical trajectory with At = 0.01 (solid), 
At = 0.005 (dashed), At = 0.0025 (dash-dot), as well as for Brownian motion (grey). 

the other particles. Within 10 time units the system is effectively indistinguishable 
from one started in the equilibrium state. As before, we generate histograms of the 
functions Fi, . . . ,F^, but now over the time interval [0, 10] in order to highlight the 
effects of the nonequilibrium initial conditions. Figure [2751 shows the results of these 
simulations. 

For both the simulations from equilibrium and from non-equilibrium initial con- 
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Fig. 2.5. Fourth set of experiments. Using initial conditions drawn from a non- equilibrium 
distribution over the time interval [0, 10]. Histograms of Fi{Q/\t) for i = 1, . . . , 5. Each plot shows 
the result for the numerical trajectory with At = 0.01 (solid), At = 0.005 (dashed), At = 0.0025 
(dash-dot). 

ditions the histograms for all the functionals Fi, . . . , F5 are virtually identical for the 
different step-lengths, in contrast to the case where we examined single trajectories. 
Any differences are well within the statistical error due to sampling only a finite 
number of trajectories. This suggests that computed distributions of the functionals 
with At = 0.01 are fairly accurate for the distributions on the initial conditions we 
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consider. The non-rigorous argument for this behef is is as foUows. If the histograms 
were not accurate for the step length we were using, then reducing the steplength 
would cause the histogram to move significantly closer to the histogram for the exact 
solution. Thus the histogram would not stay the same after halving the steplength. 
The defect in this argument is that there could in theory be a broad range of values of 
At for which the apparently same wrong histogram is computed. An approximately 
correct histogram could be only observed for a At much small than what we use. 
Despite this possibility, researchers generally trust histograms and other statistical 
information extracted from molecular dynamics trajectories 

3. Two Approaches. Here we review two different proposals for the success 
of molecular dynamics: approximation in distribution and weak shadowing. For a 
distinct but related perspective, see p7] . 

In the following (C[0,T])" denotes functions x: [0,T] M™. We use Xq to 
denote a random initial condition in M™, X G (R[0, T])™ to denote the random exact 
trajectory of a dynamical system starting at Xq, and Xa* £ (C[0, T])™ the random 
approximate trajectory with the same initial condition. We imagine the approximate 
trajectory to be generated by using a numerical method with steplength At and then 
linearly interpolating the result. We measure the distance between two members x, y 
of (R[0,T])'" by d{x,y) = \\x - y|U = sup,g[o^T] " y(y)l- Let H: R" ^ R'^ be 
a map that extracts some low dimensional information from the system, so that the 
resulting low dimensional trajectories in (C[0,T])'^ are n(X) and n(XAt). 

3.1. Approximation in Distribution. One explanation for the reliability of 
molecular dynamics is that if we let the initial condition of a simulation be random, 
then the distribution of the resulting numerical trajectory, seen as a random path 
in (C[0, r])™, is close to the distribution of the actual trajectory. We say that the 
trajectory is approximated in distribution. This is also known as weak approximation. 
Here we review some of the basic facts of approximation in distribution [3] . 

Given a random variable X taking values in R, its distribution is the probability 
measure on (R, ;B) defined by fi{A) = F{X e A) for A e B. Here B is the Borel 
subsets of R. We say that two random variables X and Y have the same distribution 
if F{X £ A) ^ ¥(Y £ A) for aU AeB. This is equivalent to Ef{X) = Ef{Y) for all 
measurable /. Note that two random variables need not be close on a realization- by- 
realization basis in order for their distributions to be identical. 

The concept of distribution extends naturally to random vectors taking values 
in R™ and indeed to random elements of any metric space as follows. Consider a 
metric space {S, d) with metric d and let B be the Borel subsets of S induced by d. 
A random element of iS is a measurable function X : Vl S where (ri,.F, P) is some 
probability space. The distribution of X on 5 is the probability measure given by 
IJl{A) = P(X € A). As in the case of random variables, two random elements X and 
Y oi S can have the same distribution without X{ljj) and Y{ljj) being close for any 
fixed a; G f2. 

Suppose we want to quantify how close the distributions of two random elements 
of a metric space are to each other. A natural way to do this is to define a metric on 
the space of random elements of a metric space. One popular choice is the Prokhorov 
metric, p, which we define here. It has the property that if p{X^ Y) ~ Q for random 
elements X and Y if and only if X and Y have identical distributions. 

For a set A C 5 and e > we define A^, the set of all points within distance e of 
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A by 



{xeSl inf d(x, y) <e}^{xeS\ d(x, A) < e}. (3.1) 

yeA 



The Prokhorov metric is defined as follows. 

Definition 3.1. p. 72] For random variables X and Y in S 

p{X, Y) := inf {e > I ¥{X G A) < V{Y £ A^) + e} . 



If we identify random elements of S that have the same distribution, then p is 
a metric on the set of random elements j9, p. 394]. If {S,d) is separable (as are all 
examples in this paper) random elements Xn converge in distribution to X if and 
only if p{Xn,X) ^ p. 395]. Note that p{X,Y) < 1 always. 

A straightforward way to measure how close X is to Xai in distribution is to 
consider p{X,XAt), where we view X and X^t as random elements of (C[0, T])™. 
However, we generalize this idea by measuring how close the distributions of low 
dimensional functions of the full trajectories are. We consider the map 11: M™ R'"', 
and we measure p{Il{X),Il{XAt)), where Ii{X) and Il{XAt) are random elements 
of (C[0, r])'"'. Choosing k = m and 11 to be the identity gives the approximation 
in distribution of X^t to X so this is a generalization of the original idea. As an 
example, suppose the differential equations defining X describe the motion of a system 
of particles in M^, so that the dimension of the system is An. To study the position 
of one particle as a function of time, one could let 11: R**" — » R^ be the function that 
simply returns the position of the first particle in the system. This choice in analogous 
to what we did for our model system in Section [2l 

With these definitions in mind, we are led to a quantification of our belief that 
n(Ar) and Il{XAt) are close in distribution: we conjecture that 

p(n(X),n(XAt)) <e, (3.2) 

for some small e. For the conjecture to be applicable to molecular dynamics, we must 
be able to control e and the length of the time interval T in terms of At. We conjecture 
that for all sufhciently small At there are some C, D, E,p > 

p{n{x),n{XAt))<CAt'', 

for T < Dexp{-E/At). 

One of the consequences of approximation in distribution with respect to the 
Prokhorov metric is that it allows us to bound the error we make in computing the 
expectation of functionals of the paths. Suppose that G : (C[0, T])'^ — *■ R is a bounded 
Lipschitz continuous function of the paths. Let the norm || • \\bl be defined on the 
set of all such G by 

\\G\\bl sup \Gix)\ + sup H^^^-^, 
X x^y If y||oo 

where the suprema are taken over all x,y € (C[0,r])'^ [9j p. 390]. We can define a 
another metric on the space of random elements of {C[0,T])^ by 

13{X,Y):= sup |EG(X) -EG(r)|, 

l|G||Bi<l 
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originally defined in [TD]. For any two random elements X and F of a metric space, 
we have p. 398] 

/3 

p{X,Y)<^-f3{X,Y)^ , (3.3) 

and P p. 411] 

PiX,Y)<2p{X,Y). 

So ([321) would imply l3{U{X),n{XAt) < 2e, and so 

\EG{U(X)) - EG(n(XAt))| < 2||G||BLe. (3.4) 

Of course, most functions G of interest are not bounded, but similar results can be 
obtained for unbounded, locally Lipschitz G in the case that G{Il{X j) and G(n{XAt)) 
have finite moments; see [8] for an example. 

3.2. Weak Shadowing. As we have discussed, numerical trajectories and ex- 
act trajectories started at the same initial condition typically rapidly diverge. One 
idea that has been proposed is that for every numerical trajectory there is an exact 
trajectory starting at a different initial condition that stays close to the numerical 
trajectory over long time intervals. This idea is known as shadowing. An early ver- 
sion of shadowing is described by Bowen [5] for Axiom A systems, though see [2], [H 
p. 381], and [521 P- 38] for earlier descriptions. A general result in this area is that if 
a system satisfies a uniform hyperbolicity condition, then shadowing is possible over 
infinite time intervals [22] . This fact was used in [23] to study the long-time averages 
over trajectories computed with a symplectic method under the assumption that the 
Poincare section of the flow is uniformly hyperbolic. 

However, many systems that arise in applications are not uniformly hyperbolic [181 
Appendix B] . To the best of our knowledge, the only example of a physically realizable 
Hamiltonian system that is uniformly hyperbolic on one of its energy levels is due to 
Hunt and Mackey [H], and this system is uncharacteristic of systems that arise in 
molecular dynamics. (Many billiard systems have been shown to be ergodic and even 
mixing, but fail to be uniformly hyperbolic because the vector field is discontinuous at 
bounces [181 Appendix B].) For more realistic systems shadowing has been numerically 
demonstrated over long but finite time intervals [12| . [16| . It remains to be seen 
whether shadowing over the long trajectories computed in molecular dynamics is 
possible. 

Let us specify formally what shadowing would consist of in our situation. Fixing 
a time interval [0, T] we say that Y, an actual trajectory of the system, e-shadows the 
numerical trajectory X^t if 

\\Y - XAtWoo < e. 

Assuming that it is possible to shadow every numerical trajectory in this way, let SAt 
be the map that takes the initial condition of the numerical trajectory XAt to the 
initial condition of the exact trajectory Y that shadows XAt- This gives us our first 
version of shadowing. 

Shadowing Version 1. There exists SAt - ^ IR™ such that ifYo — SAtXo 
then 

\\Y - XAtWoo < e- 
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This version of shadowing is not sufficient for our purposes: it is not strong 
enough to show something Hke (|3.4p . The difficulty is that even though each numerical 
trajectory is close to some exact trajectory, it could be that the distribution of Y is 
completely different from that of X. This can result even if the distribution of Xq 
and Yq are close because of the rapid divergence of trajectories of the system. This 
would in turn imply, since Xai is close to Y for every Xq, that Xai is not close to X 
in distribution. 

In order for the statistical properties of Y to be similar to that of X, it is neces- 
sary that the map SAt preserves the measure on the initial condition Xq . We are lead 
to the following concept of shadowing for initial value problems with a probability 
measure on the initial conditions: 

Shadowing Version 2. Given that Xq has distribution v on R™ them is a map 
Sai on R™ such that SaiXq also has distribution v and 

\\Y^XAt\\oo<e. 

In practice we want to allow the possibility that shadowing is not possible for 
some initial conditions. So to weaken Shadowing Version 2 slightly we conjecture: 

Shadowing Version 3. Given that Xq has distribution v on R™ there is a map 
SAt on R™ such that SaiXq also has distribution v and 

P.(||r-XAt||oo > e) < e. 

With our application in mind there is an important way we can further weaken 
Shadowing Version 3. As in the previous subsection, let H : R" be a map that 

takes configurations of the whole system and extracts lower dimensional information. 
Then we could conjecture: 

Shadowing Version 4. Given that Xq has distribution v on R™ there is a map 
SAt on R™ such that SAtXQ also has distribution v and 

P,(||n(V)-n(XAt)||oo >e) <e. (3.5) 

This final form of shadowing is what we call weak shadowing. 

In the context of molecular dynamics, establishing weak shadowing for some phys- 
ically interesting 11 would be relevant if for all sufficiently small At there were con- 
stants A,B,C,p > such that dSl]) held with e = AM^ for all T = Bexp(-C/Ai). 
Note that all versions of shadowing presented here hold immediately for small e if T 
is fixed and At is allowed to be arbitrarily small. However, this limit is not of interest 
in molecular dynamics. 

In order for weak shadowing to have the same explanatory power as Approxima- 
tion in Distribution, we need to show that it also allows us to bound the difference 
between ¥,G{X) and ¥,G{XAt) for reasonable functions G, as in (|3.4p . To see that 
it does, again define || ■ \\bl as in the previous subsection and note that for any 
G: [C[Q,T]f ^ R, G is bounded by ||G||bl and G has |1G||bl as a Lipschitz con- 
stant. In that case: 

|EG(n(x)) -EG(n(XAt))| = |EG(n(y)) -EG(n(XAt))| 

< \\G\\BLe + 2||G||GLP(||n(y) - U{XAt)\\ > e) 
<3\\G\\BLe, 

where we have used the fact that X and Y have the same distribution in the first 
equality. This result is precisely (13. 4p with a different constant. 
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4. Proof of Main Result. Our main result, Theorem 1 1.1 1 in Section [TJ asserts 
that Approximation in Distribution (Equation (|3.2[) ') and Weak Shadowing (Equation 
(|3.5p ) are equivalent. In this section we first state a more general result, Theorem l4.1l 
and then show how Theorem 11.11 follows from it. Then we present the proof of Theo- 
rem |4T] which uses our main technical result. Theorem 14.31 

We say that a measure on a space S is non-atomic if ^{{x}) — for every point 
xeS. 

Theorem 4.1. Let R" ^ (C[0,T])'= be measurable maps and let v he a 

non-atomic measure on M™. Let Xq be a random vector in R*^ with distribution v. 
Then the following are equivalent for all e > 0.' 

(A) Approximation in distribution: 

p($(Xo),$(Xo)) <e. 

(B) Weak Shadowing: There is a map S: M™ — > R™ such that SXq also has 
distribution v and 

nmSXo)-HXa)\\^>e)<e. 
Proof. To show that (A) implies (B) it is only necessary to apply Theorem 14.31 with 

{s,d) = (r,e) - (E"M| • II), x = Y = Xo, (sj) = ((c[o,r])M| • lU), ni = $i, 

112 = $2 • The theorem then gives a map -0 : R™ M™ such that 4'Xo has the same 
distribution as Xq and 

P(||$(^(Xo)))-$(Xo)||oo >e) <e. 

To show that (B) implies (A), it is only necessary to see that (B) implies 

p($(5Ao),$(Ao)) <e 

Since SXq is equal in distribution to Xq and equality in distribution is preserved under 
measurable maps, ^{SXq) is equal to Il{Xo) in distribution. Since the Prokhorov 
metric p only depends on distributions, we have that p{^{Xo), ^{Xq)) < e as required. 
□ 

Proof of Theorem \l.l[ Let $ be the map that takes initial condition Xq to Il{X). 
Let be the map that takes Xq to 11 (Xa). Let ly be the distribution of Xq. Then 
the theorem follows. □ 

Remark: Though we have motivated our result in terms of <i> being the exact 
flow of a system of differential equations and $ being the trajectory generated by a 
numerical method, the result can be much more broadly applied. In particular, $ and 
$ can be the flow maps of any two dynamical systems on the same state space. 

It only remains to prove Theorem 14.31 which is the heart of Theorem 14. II above. 
Theorem 14.31 is similar to the Strassen-Dudley Theorem [21 [30] which we state here 
for reference: 

Theorem 4.2. (f^ p. 73]) Let {S,d) be a separable metric space. If X and Y 
are random elements of S with p{X, Y) < (3, then there are random elements X and 
Y of S defined on a common probability space such that X has the same distribution 
as X ,Y has the same distribution as Y and 

f(d{X,Y) >(3)<I3. 



□ 
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In contrast, our theorem is as follows. Recall that the distribution of X, a random 
element of S, is non-atomic when F{X ^ x) ~ Q for all x ^ S. 

Theorem 4.3. Let X be a random element of the separable complete metric 
space {S, d). Let Y be a random element of the separable complete metric space (T, e). 
Suppose that the distributions of both X and Y are non-atomic. Let (S, d) be another 
separable complete metric space, and let Hi: S ^ S and II2: T ^ S be measur- 
able maps. Let p denote the Prokhorov metric on random elements of {S,d). If 
p{Ili{X),Il2{Y)) < (3 then there is a measurable map ip : S T such that Y = ipX 
has the same distribution as Y and 

P(d(ni(x),n2(r)) >/?)</?. (4.1) 

Taking {S, d) = (5, d) = (T, e) and Hi and 112 to be the identity gives the following 
simple corollary that is easier to compare with Theorem 14.21 

Corollary 4.4. Let {S, d) be a separable complete metric space. Let X and Y 
be random elements of S with non-atomic distributions. If p(X,Y) < P, then there is 
measurable map ip from {S, d) to itself such that Y = -j/jX has the same distribution 
as Y and 

W{d{X,Y) > 13) <p. 

This result is stronger than Theorem 14.21 in that instead of being forced to have 
a new probability space and define two new random variables X and Y, we are able 
to leave X as it is and define y to be a random variable on the same space that X 
is defined on. This extra strength is necessary for us in order to make the connection 
with shadowing in Theorem 14.11 The extra cost is that we assume that the metric 
spaces in which X and Y live are complete and, more importantly, that the probability 
distributions of X and Y are non-atomic. 

To see that the assumption of non-atomicity is essential in Corollarv l4.4|, and hence 
in Theorem 14. 3( consider the following example. For any e G (0, 1/2) let = {0, 1} 
and let be all the subsets of il. Let the probability measure F be defined by 
P(0) = 1/2 - e, P(l) = 1/2 + 6. Then {Q, T, P) is a probabihty space. Define random 
variables X and Y by X{0) = 0,X{1) = 1,Y{0) = 1,Y{1) = 0. It is straightforward 
to check that p{X,Y) — e. However, Y is the only random variable on (fJ, JF, P) that 
has the same distribution as F, and V{\X -Y\> 1/2) = 2e. So V{\X -Y\> e) > e, 
and the result cannot hold. 

Now we turn to the proof of Theorem 14.31 Many of the ideas used in the proof 
of the Strassen-Dudley Theorem reappear here, including the use of the Marriage 
Lemma. Before the proof we need several lemmas that allow us to construct measure 
isomorphisms between various spaces and also to divide spaces into many pieces of 
equal measure. 

Lemma 4.5. '25, p. 327] Let {S,d) be a complete separable metric space with 
Borel a-algebra B{S). Let fi be a non-atomic probability measure on {S,B{S)). Let 
([0,1], S([0,1]), A) be the unit interval with Lebesgue measure defined on the Borel 
sets. Then there is a subset Sq of S with fii^So) — IJ-{S) and subset Lq of [0, 1] with 
A(Lo) = A([0, 1]) such that there is a measurable invertible ip : Sq Lq such that 
nlip-^A) = X{A) for all A G S([0, 1]) n Lq. □ 

Lemma 4.6. Let {S,d) and (T, e) be two complete separable metric spaces with 
Borel a-algebras B{S) and B{T) respectively. Let fis o,nd /iy be non-atomic probability 
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measures on {S,B{S)) and {T,B{T)) respectively. Then there is a measurable map 
ip: S —>■ T such that 

fisii^^^A)^ fiT{A) 

for all A G B{T). Proof. Use Lemma 14.51 to construct subsets of full measure 
So and Tq in S and T with measure preserving invertible maps ipi : So ^ Li and 
'<p2 - To L2. Let So = i^i^Li n V'f and let ip equal restricted to this set. 

Let To be the image of 5*0 under ip. Now extend ip in an arbitrary measurable way to 
all of S. U 

Lemma 4.7. Let {S,d) be a complete separable metric space with Borel a-algebra 
B{S). Let 11 be a non-atomic probability measure on {S,B). Given any subset T G 
B{S) with ^{T) ~ m < 1 and any finite set of real numbers nii, i — I,. . . ,n with 
X^i Tni — m there is a partition of T 

T = ur=iT, 

with fJ-(Ti) — rui, i — 1, . . . ,n. 

Proof. We begin by proving the result for the {S, d) being the unit interval [0, 1] 
and ^ Lebesgue measure. Then we use the previous lemma to establish the general 
case. 

Let T be a Borel subset of [0, 1] with Lebesgue measure m. Consider the function 
(T : [0, 1] [0, m] defined by 

a{x) = A(rn [0,a;]). 

Since A is non-atomic a is continuous. Let mp = and fhi — '^j- Let Ti = 

(T^^([mi_i, m^)) for 1 < i < n — 1 and r„ — (T^^([m„_ir?i„]). It is straightforward to 
show that the Ti satisfy the required conditions. 

For the case of general {S,d), let tp, So, Lq be as given by Lemma 1431 The 
subset of [0, 1] given by tjj{T n So) has measure m. By the previous case this can 
be partitioned into n subsets Ri with X{Ri) = rui that are all subsets of Lg. Let 
T, = for 1 < i < n - 1 and r„ = i^^^iRn) U (T \ 5*0). It is straightforward 

to show that Ti have the required properties. □ 

Lemma 4.8. Let {S,d) be a complete separable metric space with non-atomic 
probability measure fi on it. Let (S, d) be another complete separable metric space and 
U: S S a measurable map. For any e > there is a 6 < e and a finite partition of 
S 

S^S*U (U^=i5fc), 

.such that 

(i) fJ.{Sk) = S for all k, 

(ii) diam(n(5fe)) < e for all k, 
(ill) fi{S*) < e. 

Moreover, this is possible for all sufficiently small S. 

Proof. Let z^, i > 1 be a dense sequence of points in S. Let Bi C S he the inverse 
image under 11 of the open ball of radius e/2 about Xi in S. For i > 1 let 



Bi = Bi \ ^^jJiBj. 
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Then the are disjoint, diam(n(i?i)) < e and have union S. Choose m such that 

Y,^^iB^) > l-e/2. 

Now choose a sufficiently small S so that m(5 < e/2. Using Corollary 14.71 divide each 
Bi into ki sets -B^j with fj,{Bij) — S and one additional set B* with IJ.(B*) < S. Now 
there are finitely many sets Bij all having measure 6. Call these sets Sk- Condition 
(i) is then satisfied. Each has diameter less than e, since each is a subset of some Bi. 
So condition (ii) is satisfied. If we let S* = S \ UjJ^j^S'fe we have 

Cm \ m 

s\Y,fi{B,)\+j2fi{B:) 

< e/2 + m(5 = e, 

showing that condition (iii) is satisfied. □ 

Proof of Theorem Let a = p{Ui{X),Il2{Y)) < 13. Consider any e > 0, which 
we will fix later to obtain the required result. 

Use Lemma 2^ to construct finite partitions of S and T, 



such that 



and for all i 



S = S*U (Ur^i5,), T = T*U (U^Li^,), 

v{x e s*) < e, P(y e t*) < e, 

diam(ni(S'i)) < e, diam(n2(Ti)) < e, 



F{X E Si) = P(y eTi)^S <e. 

We can use the same S for both S and T since Lemma 14.81 shows that for each e the 
construction is possible for all sufficiently small S. 

We will construct a 1-1 mapping (j> from the set {1, . . . ,n} to itself such that for 
most i we have d{Ili(Si),Il2{T^(^i))) < a + e. In other words, for most i there will be a 
point in ni(S'i) and a point in Il2{T^(^i^) that are within distance a + e of each other. 
Based on (p, we will then use Lemma 14.51 to construct a map ip on S that takes Si to 
T^{i) and such that ipX has the same distribution as Y . We will construct the map (j) 
with the help of the Marriage Lemma of Konig and Hall [3] ■ 

Lemma 4.9. (See |3 p. 406].) Let K denote a relation on {1, . . . ,n} such that 
for all subsets A o/ {1, . . . , n} 

\{j eA: iKj for some i e A}\ > \A\ (4.2) 

where \ ■ \ denotes cardinality. Then there is a 1-1 mapping (p of {1, . . .,n} to itself 
such that iK^^i^^ for all i. □ 

Ideally we would define the relation i^T on { 1 , . . . , n} by saying that iKj if d{lii (5^ ) , 112 [Tj ) ) 
< a + e, and then using the Marriage Lemma to construct a mapping (p such that 
d(Jii{Si), n2(T0(j))) < a + e for all i. However, in general (|4.2[) does not hold for this 
definition of if, and such a (p does not exist. 

Instead, we construct a map (p such that d{Jli{Si).,Ii2{Ttjj(i))) < a + e only for 
most i, as follows. We append to {1, . . . , n} k extra indices n + 1, . . . , n + fc. Now for 
i, j e {1, . . . , n + fc} we say that iKj if either 
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1. (i(ni(5,),n2(T,)) <a + e, 

2. i > n, or 

3. j > n. 

Now let A be a subset of {1, . . . , n + fc}. Either A contains at least one of n + 
1, . . . , n + fc or it doesn't. In the former case (|4.2p holds immediately. In the latter 
case, let Sa — HieASi. We have that 

P{X e Sa) = \A\5. 

Let 

B^{zeS: d(ni{SA),Ii2{z)) < a + e}, 

so that 

¥{Y eB)=¥{d{ni{SA),ll2iY))<a + e) 
>F{d{ni{SA),n2{Y))<a) 

= p(n2(r) eni(5^r) 

Then since p{ni{X),U2{Y)) = a, 

p(ni(x) e ii,{Sa)) < nn^iY) e u^iSAr) + «. 

This fact yields 

p(y eB)> P(ni(x) e 01(5^)) -a> ¥{X eSA)-a^ \A\6 - a. 
So the number of sets Tj that have some portion in B is at least 
{\A\S -a~P{Y e T*))/S > \A\ ~{a + e)/S. 

For all these j < n there is some i G A such that iKj. 

When we include all the j > n, the total number of j such that iKj for some 
i E A is then at least |^| — {a + e)/S + k. So if we let k = \{a + e)/(5], and then 
relation K on {1, . . . ,n + k} satisfies the conditions of the Marriage Lemma. 

This gives us that there is a 1-1 map (j) between {1, . . . ,n + k} and itself such that 
iK^j^.j^-^ for all i. We want to get a map 1-1 map on {1, . . . , 71} so that iK^(^i) for most 
i. Consider what (f> does to the set {1, . . . , n}. At least n — k elements get mapped 
back to {1, . . . , n}. Let 4>{i) — <j>{i) for these elements. For all the others, just let 4>{i) 
be extended to be 1-1 on {1, ... , n}. 

Now we have an invertible map on {1, . . . , n} such that for n — A: of the elements 

i 

J(ni(^,),n2(r^(,))) <a + e. 

Using Lemma |4.6[ for each i there is a map ^pi : Si — > T^(i) such that for any measur- 
able subset C of Si 

F{X eC) =F{Y eMC))- 

Again use Lemma HH] to construct a map -0* : S"* — > T* such that ¥{X E C) = ¥(Y E 
ip^,{C)) for all measurable C C S* . Now let -0: S" ^ T be defined by requiring that 
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ip restricted to Si is ipi and that ^ restricted to S* is tp^:. Then for any measurable 
subset C of S 

V{X e C) = P(y e ipC). 
This means that if we let Y = ipX, for any measurable subset D oi S 

F{Y ED) = P(V'X eD)^ ¥{x e ij^^D) = ¥{Y £ ijj^r^D) = P(r e D) 

as required. 

It remains to show that Y satisfies equation (|4.ip . Now for n — k indices i 

J(ni(^,),n2(T^(.))) <a + e. 

liX eSi then Y e r0(,) and 

J(ni(X),n2(y)) < a + e + 2e, 

since ni(S'i) and n2(T^(j)) have diameters smaller than e. So with probability at least 
5{n - k) we have that d{\li{X) ,n2{Y)) < a + 3e. Since 1 = n(5 + ¥{X e 5*) and 

P(X e 5*) < e, 

P(J(ni(X),n2(r)) >a + 3e) <A:(5 + e = (5[(a + e)/(5] + (5 < a + e + 2^ < a + 3e. 

Choosing e so that a + ie < (3 then gives our result. □ 

5. Discussion. Suppose we are considering a particular molecular dynamics sim- 
ulation over a long time interval [0, T] started from random initial conditions. We wish 
to determine what statistical features of its trajectories are computed reliably. A sim- 
ple baseline conjecture would be that all statistical features of the trajectories are 
computed accurately. As we have detailed above, a quantitative version of this con- 
jecture would be to say that Xa*) = e for some small e > 0. Then Theorem ll.il 
states that with probability greater than 1 — e, the numerical trajectory is shadowed 
by an exact trajectory to within error e. In this case, it should be possible to detect 
shadow trajectories numerically using the techniques of [17], even though the shad- 
ows computed will not necessarily have the correct measure on their initial conditions. 
Find such shadow trajectories would be partial confirmation that p{X, Xai) is small. 

The other possibility is that p{X, XAt) is not small. Suppose instead that p{X, Xai) > 
1/2. Now (|331) implies that P{X,XAt) > 1/6. This means that 

||EG(X) -EG(XAt)ll > 1/6 (5.1) 

for some G: (C[0,T])™ K with ||G||bi,. Hence one way to confirm that p{X,XAt) 
is large is to find such a G such that we empirically observe (|5.1[) . 

In principle this approach is reasonable, but in practice the space of all functions 
G: (G[0,T])'" — > R with ||G||bl = 1 is huge for a realistic molecular dynamics simu- 
lation. In practice it may be that only for very unusual functions G do EG(X) and 
EG(XAt) disagree significantly. One practical way to approach this is to start with 
very low-dimensional systems. The lowest dimensional system that is a reasonable 
model of molecular dynamics consists of two particles on a two-dimensional periodic 
domain [3T]. For example, one could study the system we considered in Section [5] 
but with only two particles. Using the software available to compute exact shadow 
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trajectories of numerical trajectories |17 , would show where shadowing is not possible 
and could suggest what functions G are likely candidates. 

We have been considering the case where 11 is the identity, for which it may be 
that p(n(X), Ii{X^t)) is large. The other direction to study these systems is to choose 
a n that is a very low dimensional function of the state of the system and then see 
if it is possible to numerically perform weak shadowing with this choice. Currently 
there have not been algorithms developed to do this, but it is a direction for future 
work. 

Finally, besides these numerical/experimental approaches there are more analyti- 
cal approaches. These would involve studying one of the model systems for molecular 
dynamics available that are analytically tractable. Both the systems studied in [T9] 
and [5T] consist of single particle coupled to a bath of a very large or infinite number 
of smaller particles. In both cases it is shown that the distribution of the path of the 
large particle converges to a stochastic process that can be simply described through 
low-dimensional stochastic differential equations. These situations are clean enough 
that similar results for the numerical discretization of the Hamiltonian equation may 
be possible, thus establishing approximation in distribution for the path of the dis- 
tinguished particle. Our result then allows us to conclude that weak shadowing is 
possible. 

Our work allows the possibility of studying the reliability molecular dynamics in 
a variety of contexts from one of two directions: either the statistical one through the 
computing of histograms, or the dynamical one through the computing of shadowing 
trajectories. 
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